towards an even better backward attention kernel #179

ngc92 · 2024-04-19T10:42:38Z

Back to the drawing board, because I think the other kernels hit a local minimum, or at least the way the loops were organized made it very difficult to think about how to optimize them further.

I think there is quite a bit of room to optimize this version further, but for educational purposes, I think it is better to have a simpler version first, where the main idea is evident, and then go crazy with the optimizations in a second version.

This design limits T to multiples of the block_size. Incidentally, on my system at least, block size 64 turns out to be fastest, which is compatible with both our training and test scripts.
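
A minimal host-side sketch of the constraint described above, assuming the block size is a template parameter chosen at launch time. The names `count_blocks` and `dispatch_block_count` are hypothetical and do not come from the PR; the real kernel launch is replaced by a function that only reports how many blocks of T would be processed.

```cpp
#include <cassert>

// Hypothetical stand-in for a templated kernel launch. Because BlockSize is a
// template parameter, it is a compile-time constant inside this function,
// which is what allows loops over a block to be unrolled.
template <int BlockSize>
int count_blocks(int T) {
    static_assert(BlockSize > 0, "BlockSize must be positive");
    return T / BlockSize;  // number of BlockSize-wide tiles along T
}

// Maps a runtime block size to one of the compiled instantiations.
// Returns the number of blocks, or -1 if T is not a multiple of block_size
// or no instantiation exists for that block size (an assumption of this
// sketch: only 32, 64 and 128 are compiled).
int dispatch_block_count(int T, int block_size) {
    if (block_size <= 0 || T % block_size != 0) {
        return -1;  // the design above only supports T as a multiple of block_size
    }
    switch (block_size) {
        case 32:  return count_blocks<32>(T);
        case 64:  return count_blocks<64>(T);
        case 128: return count_blocks<128>(T);
        default:  return -1;
    }
}
```

With T = 1024 and a block size of 64 this yields 16 blocks, while T = 1000 is rejected because it is not a multiple of 64.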

karpathy · 2024-04-19T15:59:06Z

dev/cuda/attention_backward.cu

@@ -560,6 +561,128 @@ __global__ void softmax_autoregressive_backward_kernel5(float* __restrict__ dpre
    }
 }

+
+// I want `BlockSize` to be statically known to the compiler, thus we get a template here.


love this comment block

ngc92 force-pushed the even-better-attention branch from 5f1b7a3 to 51c6d8a Compare April 19, 2024 15:35

karpathy reviewed Apr 19, 2024

ngc92 added 4 commits April 19, 2024 19:12

towards an even better backward attention kernel

16123c3

further simplification of the loop

02aa8e4

don't do stupid redundant work; the inner loop is just a block-level sum!!!!

4294820

further cleanup and ability to handle arbitrary sequence lengths

50e105a

ngc92 force-pushed the even-better-attention branch from d8308d6 to 50e105a Compare April 19, 2024 16:13

karpathy merged commit b556ad9 into karpathy:master Apr 19, 2024

ngc92 deleted the even-better-attention branch April 28, 2024 08:43
